Goto

Collaborating Authors

 assumption 4






Supplementary Material for Understanding and Improving Ensemble Adversarial Defense

Neural Information Processing Systems

They are used to test the proposed enhancement approach iGA T. In general, ADP employs an ensemble by averaging, i.e., (C 1) ( C 1) Adversarial examples are generated to compute the losses by using the PGD attack. Our main theorem builds on a supporting Lemma 2.1. We start from the cross-entropy loss curvature measured by Eq. The above new expression of T (x) helps bound the difference between h(x) and h(x). Note that these three cases are mutually exclusive.




A Proofs of Linear Case Throughout the appendix, for ease of notation, we overload the definition of the function d

Neural Information Processing Systems

The proof of this lemma requires Lemma A.1, which characterizes the distribution of the residual By Pinsker's inequality, this implies d By Lemma A.1, we have E[ X ( null w w The proof is inspired by Theorem 11.2 in [20], with modifications to our setting. First, we construct a "ghost" dataset The most challenging aspect of the ReLU setting is that we do not have an expression for the TV suffered by the MLE, such as Lemma 4.2 in the linear case. The proof of this Lemma, as well as other Lemmas in this section, can be found in Appendix B.1. Using Lemma B.2 and Lemma B.3, we can form a uniform bound, such that all A straight forward combination of Lemma 4.3 and Lemma B.4 gives the following Theorem. Now we can apply Bernstein's inequality (Theorem 2.10 of [8]).